
3625: Add decompression size limit to prevent decompression bomb DoS#28

Open
martin-augment wants to merge 1 commit into main from
pr-3625-2026-01-14-10-57-05

Conversation

@martin-augment
Owner

3625: To review by AI

…b DoS

Add maximum decompression size limit in DeflateCodec to prevent
OutOfMemoryError when processing maliciously crafted Avro files
with high compression ratios (decompression bombs).

The limit defaults to 200MB and can be configured via system property:
org.apache.avro.limits.decompress.maxLength
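As an illustration, the property can be set programmatically before the codec is first used; a minimal sketch (the 500 MB value is an arbitrary example, and the class name is invented here):

```java
// Sketch: raising the decompression limit before any Avro I/O runs.
// The property name comes from this PR; the 500 MB value is an arbitrary example.
public class ConfigureDecompressLimit {
  public static void main(String[] args) {
    System.setProperty("org.apache.avro.limits.decompress.maxLength",
        String.valueOf(500L * 1024 * 1024)); // 500 MB
    System.out.println(
        System.getProperty("org.apache.avro.limits.decompress.maxLength"));
  }
}
```

The same setting can of course be passed as `-Dorg.apache.avro.limits.decompress.maxLength=...` on the JVM command line.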
@github-actions github-actions bot added the Java label Jan 14, 2026
@coderabbitai

coderabbitai bot commented Jan 14, 2026

Walkthrough

The DeflateCodec.decompress method is modified to enforce a maximum decompressed size limit through a configurable system property org.apache.avro.limits.decompress.maxLength, defaulting to 200 MB. The implementation replaces InflaterOutputStream with a manual inflate loop that tracks total decompressed bytes and throws AvroRuntimeException when the limit is exceeded. DataFormatException handling is added and wrapped as IOException. A private helper method getMaxDecompressLength() is introduced to parse the property with fallback to the default value. No public API declarations are altered; only private implementation details and imports are added.
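The mechanism described above can be sketched in isolation. The following is a self-contained approximation of the idea under stated assumptions: the buffer size, exception wording, and the helper/class names are illustrative, not the actual DeflateCodec code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BoundedInflateSketch {
  // Mirrors the PR's 200 MB default; the constant name is invented for this sketch.
  static final long DEFAULT_MAX = 200L * 1024 * 1024;

  static byte[] inflateBounded(byte[] compressed, long maxLength) throws IOException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[8192];
      long total = 0;
      while (!inflater.finished()) {
        int len;
        try {
          len = inflater.inflate(buffer);
        } catch (DataFormatException e) {
          throw new IOException("Invalid deflate data", e);
        }
        if (len == 0) {
          // Truncated input or a preset-dictionary requirement: fail fast
          // instead of spinning or silently returning partial output.
          throw new IOException("Invalid or incomplete deflate data");
        }
        total += len;
        if (total > maxLength) {
          throw new IOException("Decompressed size " + total
              + " exceeds maximum allowed size " + maxLength);
        }
        out.write(buffer, 0, len);
      }
      return out.toByteArray();
    } finally {
      inflater.end(); // release native buffers even on failure
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] input = "hello hello hello".getBytes();
    Deflater d = new Deflater();
    d.setInput(input);
    d.finish();
    byte[] buf = new byte[256];
    int n = d.deflate(buf);
    d.end();
    byte[] restored = inflateBounded(Arrays.copyOf(buf, n), DEFAULT_MAX);
    System.out.println(new String(restored)); // round-trips the original text
  }
}
```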


@claude

claude bot commented Jan 14, 2026

Code Review: Add decompression size limit to prevent decompression bomb DoS

Summary

This PR adds important security protection against decompression bomb attacks by limiting the maximum decompressed size. The implementation is generally sound but has several issues that should be addressed.


🔴 Critical Issues

1. Resource Leak - Inflater not properly managed

Location: DeflateCodec.java:101-120

The code calls getInflater() which returns a reusable Inflater instance (line 101), but if an exception occurs during decompression, the Inflater is never reset. This can lead to:

  • Memory leaks from the native deflate buffers
  • Incorrect state in subsequent decompress() calls on the same codec instance

The original code used InflaterOutputStream which properly managed the Inflater lifecycle. The new code must ensure cleanup happens even on error.

Recommendation: Wrap the decompression logic in a try-finally block or use try-with-resources pattern if possible.

2. Integer Overflow Risk

Location: DeflateCodec.java:110

totalBytes += len;  // 'len' is int, 'totalBytes' is long

While totalBytes is a long, len is an int returned by inflater.inflate(buffer). With corrupted data there is a theoretical risk of overflow in pathological cases, though it is unlikely in practice since the limit check runs immediately after each addition.

Severity: Low (edge case)


⚠️ Major Issues

3. Inconsistent Security Across Codecs

The decompression limit is only applied to DeflateCodec, but other codecs (BZip2Codec, ZstandardCodec, XZCodec) are equally vulnerable to decompression bombs.

Recommendation: Consider applying similar protections to all compression codecs for consistency. This could be:

  • A shared utility method in the Codec base class
  • A wrapper that enforces limits for all codec types
  • At minimum, document why DeflateCodec is singled out
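One possible shape for the shared-wrapper option above: an OutputStream decorator that any codec's decompress path could write through. LimitedOutputStream is a hypothetical name for this sketch; nothing like it currently exists in Avro.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical decorator that caps how many bytes may pass through it.
public class LimitedOutputStream extends FilterOutputStream {
  private final long maxBytes;
  private long written;

  public LimitedOutputStream(OutputStream out, long maxBytes) {
    super(out);
    this.maxBytes = maxBytes;
  }

  @Override
  public void write(int b) throws IOException {
    checkLimit(1);
    out.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    checkLimit(len);
    out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time default
  }

  private void checkLimit(int len) throws IOException {
    written += len;
    if (written > maxBytes) {
      throw new IOException("Decompressed size exceeds limit of " + maxBytes + " bytes");
    }
  }
}
```

Each codec would wrap its output sink once, so the counting logic lives in one place instead of being re-implemented per codec.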

4. getMaxDecompressLength() Called on Every decompress()

Location: DeflateCodec.java:45-55, 96

The system property is read and parsed on every single decompress() call. This is inefficient for high-throughput scenarios.

Recommendation: Cache the value as a static final field, initialized once:

private static final long MAX_DECOMPRESS_LENGTH = initMaxDecompressLength();

private static long initMaxDecompressLength() {
  String prop = System.getProperty(MAX_DECOMPRESS_LENGTH_PROPERTY);
  if (prop != null) {
    try {
      return Long.parseLong(prop);
    } catch (NumberFormatException e) {
      // Log warning or use default silently
    }
  }
  return DEFAULT_MAX_DECOMPRESS_LENGTH;
}

💡 Minor Issues & Suggestions

5. Silent NumberFormatException Handling

Location: DeflateCodec.java:50-52

Invalid configuration values are silently ignored. Consider logging a warning so administrators know their configuration is being ignored.
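A sketch of non-silent handling, falling back to the default while telling the operator their setting was ignored. System.err stands in for whatever logging facility the codebase actually uses, and the method/class names are invented here.

```java
public class LimitParsing {
  // Parse a limit property; warn and fall back on anything unusable.
  static long parseLimit(String prop, long defaultValue) {
    if (prop == null) {
      return defaultValue;
    }
    try {
      long parsed = Long.parseLong(prop);
      if (parsed > 0) {
        return parsed;
      }
      System.err.println("Ignoring non-positive decompress limit: " + prop);
    } catch (NumberFormatException e) {
      System.err.println("Ignoring malformed decompress limit: " + prop);
    }
    return defaultValue;
  }

  public static void main(String[] args) {
    System.out.println(parseLimit("1048576", 200L));      // valid value wins
    System.out.println(parseLimit("not-a-number", 200L)); // warns, falls back
  }
}
```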

6. Potential Off-by-One in Size Check

Location: DeflateCodec.java:111

if (totalBytes > maxLength) {

Should this be >= instead of >? Currently allows exactly maxLength + 1 bytes (when the last chunk pushes it over). Probably not critical but worth clarifying intent.

7. Error Message Improvement

The error message is good, but could include:

  • The actual compressed input size (for forensics)
  • A reference to documentation/security advisory

✅ Positive Aspects

  1. Good configurability - System property allows runtime tuning without code changes
  2. Reasonable default - 200MB is a sensible limit for most use cases
  3. Clear error messages - Users get actionable information when the limit is hit
  4. Follows existing patterns - Code style matches the rest of the codebase
  5. Security-focused - Addresses a real DoS vulnerability

🧪 Test Coverage

CRITICAL MISSING: No tests were added for this security feature!

Required tests:

  1. Test that limit is enforced - Create compressed data that expands beyond 200MB and verify AvroRuntimeException is thrown
  2. Test custom limit via system property - Verify configuration works
  3. Test legitimate large data - Ensure valid data under the limit still works
  4. Test edge cases - Data exactly at the limit, slightly over, etc.
  5. Test resource cleanup - Verify inflater state after failed decompression

Recommendation: Add a new test class TestDeflateCodecSecurity.java or extend TestAllCodecs.java
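The first required test can be sketched without JUnit: deflate a highly repetitive payload (which compresses to almost nothing) and verify that inflating under a small cap fails. The helper below is a stand-in for the size-limited decompress under test, not the real DeflateCodec.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DecompressionBombTestSketch {
  public static void main(String[] args) throws Exception {
    // 1 MB of zeros deflates to roughly a kilobyte: a miniature bomb.
    byte[] payload = new byte[1 << 20];
    Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
    d.setInput(payload);
    d.finish();
    byte[] buf = new byte[1 << 16];
    byte[] compressed = Arrays.copyOf(buf, d.deflate(buf));
    d.end();

    boolean limitHit = false;
    try {
      inflateWithLimit(compressed, 1024); // cap far below the 1 MB expansion
    } catch (IOException expected) {
      limitHit = true;
    }
    System.out.println("limit enforced: " + limitHit);

    // Legitimate data under the limit must still round-trip.
    byte[] ok = inflateWithLimit(compressed, 2L << 20);
    System.out.println("full size restored: " + (ok.length == payload.length));
  }

  // Stand-in for the size-limited decompress under test.
  static byte[] inflateWithLimit(byte[] compressed, long max) throws IOException {
    Inflater inf = new Inflater();
    try {
      inf.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] b = new byte[8192];
      long total = 0;
      while (!inf.finished()) {
        int len;
        try {
          len = inf.inflate(b);
        } catch (DataFormatException e) {
          throw new IOException("Invalid deflate data", e);
        }
        if (len == 0 && inf.needsInput()) {
          throw new IOException("Truncated deflate data");
        }
        total += len;
        if (total > max) {
          throw new IOException("Limit exceeded: " + total + " > " + max);
        }
        out.write(b, 0, len);
      }
      return out.toByteArray();
    } finally {
      inf.end();
    }
  }
}
```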


📋 Recommendations Summary

Must Fix (before merge):

  1. Add proper resource cleanup (try-finally for inflater reset)
  2. Add comprehensive test coverage
  3. Cache the max length value (performance)

Should Fix:
4. Document why only DeflateCodec gets this protection (or apply to all codecs)
5. Add logging for invalid configuration
6. Clarify off-by-one behavior (> vs >=)

Nice to Have:
7. Enhanced error messages with compressed size
8. Documentation/CHANGELOG entry about the new security feature and system property


Security Assessment

The core security improvement is valid and valuable
⚠️ Implementation has resource management issues that could cause instability
⚠️ Lack of tests means the protection is unverified

The PR addresses a real security concern (CVE-worthy decompression bombs), but needs the above fixes before it's production-ready.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java (1)

42-55: Harden max-length property parsing (negative/zero + SecurityException)
getMaxDecompressLength() currently accepts negative/zero values (making behavior surprising) and can throw SecurityException under restricted environments. Consider validating > 0 and falling back safely.

Proposed diff
 private static long getMaxDecompressLength() {
-  String prop = System.getProperty(MAX_DECOMPRESS_LENGTH_PROPERTY);
-  if (prop != null) {
-    try {
-      return Long.parseLong(prop);
-    } catch (NumberFormatException e) {
-      // Use default
-    }
-  }
-  return DEFAULT_MAX_DECOMPRESS_LENGTH;
+  final String prop;
+  try {
+    prop = System.getProperty(MAX_DECOMPRESS_LENGTH_PROPERTY);
+  } catch (SecurityException e) {
+    return DEFAULT_MAX_DECOMPRESS_LENGTH;
+  }
+  if (prop == null) {
+    return DEFAULT_MAX_DECOMPRESS_LENGTH;
+  }
+  try {
+    long parsed = Long.parseLong(prop);
+    return parsed > 0 ? parsed : DEFAULT_MAX_DECOMPRESS_LENGTH;
+  } catch (NumberFormatException e) {
+    return DEFAULT_MAX_DECOMPRESS_LENGTH;
+  }
 }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 2a739f8 and 50a6bfc.

📒 Files selected for processing (1)
  • lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java
🔇 Additional comments (1)
lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java (1)

23-29: Imports are fine; ensure the chosen exception type is intentional
Adding AvroRuntimeException means decompress() can now fail with an unchecked exception (despite throws IOException). If callers expect to handle failures via IOException only, consider throwing IOException for limit violations (or wrapping consistently at a higher layer).
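One way to keep limit failures inside the declared throws IOException contract is a dedicated IOException subclass; the class name below is invented for this sketch and does not exist in Avro.

```java
import java.io.IOException;

// Hypothetical checked exception for the size-limit case. Because it extends
// IOException, existing catch (IOException) blocks in callers keep working.
public class DecompressionLimitExceededException extends IOException {
  private final long observed;
  private final long limit;

  public DecompressionLimitExceededException(long observed, long limit) {
    super("Decompressed size " + observed + " exceeds maximum allowed size " + limit);
    this.observed = observed;
    this.limit = limit;
  }

  public long getObserved() { return observed; }
  public long getLimit() { return limit; }

  public static void main(String[] args) {
    IOException e = new DecompressionLimitExceededException(300L, 200L);
    System.out.println(e.getMessage());
  }
}
```

Callers that want to distinguish the limit case from ordinary I/O failures can catch the subclass specifically.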


Comment on lines 95 to 123
public ByteBuffer decompress(ByteBuffer data) throws IOException {
  long maxLength = getMaxDecompressLength();
  NonCopyingByteArrayOutputStream baos = new NonCopyingByteArrayOutputStream(DEFAULT_BUFFER_SIZE);
  byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
  long totalBytes = 0;

  Inflater inflater = getInflater();
  inflater.setInput(data.array(), computeOffset(data), data.remaining());

  try {
    while (!inflater.finished()) {
      int len = inflater.inflate(buffer);
      if (len == 0 && inflater.needsInput()) {
        break;
      }
      totalBytes += len;
      if (totalBytes > maxLength) {
        throw new AvroRuntimeException(
            "Decompressed size " + totalBytes + " exceeds maximum allowed size " + maxLength
                + ". This can be configured by setting the system property " + MAX_DECOMPRESS_LENGTH_PROPERTY);
      }
      baos.write(buffer, 0, len);
    }
  } catch (DataFormatException e) {
    throw new IOException("Invalid deflate data", e);
  }

  return baos.asByteBuffer();
}

⚠️ Potential issue | 🔴 Critical

Fix “no progress” inflate states to avoid infinite loop / accepting truncated data
In the current loop, len == 0 can occur with needsDictionary() (infinite loop) and needsInput() while not finished() (currently break + returns partial output). These should be treated as invalid/incomplete deflate data and fail fast.

Proposed diff
   try {
     while (!inflater.finished()) {
       int len = inflater.inflate(buffer);
-      if (len == 0 && inflater.needsInput()) {
-        break;
-      }
+      if (len == 0) {
+        if (inflater.needsDictionary()) {
+          throw new IOException("Invalid deflate data (requires a preset dictionary)");
+        }
+        if (inflater.needsInput()) {
+          throw new IOException("Invalid deflate data (truncated input)");
+        }
+        // Defensive: avoid spinning if inflater makes no progress
+        throw new IOException("Invalid deflate data (stalled inflater)");
+      }
       totalBytes += len;
       if (totalBytes > maxLength) {
         throw new AvroRuntimeException(
             "Decompressed size " + totalBytes + " exceeds maximum allowed size " + maxLength
                 + ". This can be configured by setting the system property " + MAX_DECOMPRESS_LENGTH_PROPERTY);
       }
       baos.write(buffer, 0, len);
     }
   } catch (DataFormatException e) {
     throw new IOException("Invalid deflate data", e);
   }
+
+  if (!inflater.finished()) {
+    throw new IOException("Invalid deflate data (did not reach stream end)");
+  }

   return baos.asByteBuffer();
 }

Owner Author
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! Returning 0 inflated bytes needs more care to avoid an infinite loop or an unfinished read. Both would be very problematic for user applications: an infinite loop would pin a CPU core, and returning less data than expected would silently truncate the decompressed output.

@augmentcode

augmentcode bot commented Jan 14, 2026

🤖 Augment PR Summary

Summary: This PR hardens Java DEFLATE decompression against decompression-bomb DoS by enforcing a maximum decompressed output size.

Changes:

  • Adds a configurable max decompressed length via system property org.apache.avro.limits.decompress.maxLength (default 200MB).
  • Reworks DeflateCodec.decompress() to inflate in a loop and count produced bytes.
  • Throws when the produced output exceeds the configured limit, preventing unbounded memory growth.
  • Maps invalid deflate streams (DataFormatException) to IOException.

Technical Notes: The new logic replaces InflaterOutputStream with manual Inflater management to track output size precisely.



@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.



try {
  while (!inflater.finished()) {
    int len = inflater.inflate(buffer);
    if (len == 0 && inflater.needsInput()) {

inflate() can return 0 when the stream isn’t finished and it doesn’t need more input (e.g., needsDictionary()), which would make this loop spin forever; also, breaking on needsInput() while !finished() risks returning truncated output without error. Consider explicitly handling the len == 0 cases (dictionary/truncated data) to avoid hangs and silent corruption.


Owner Author

value:useful; category:bug; feedback:The Augment AI reviewer is correct! Returning 0 inflated bytes needs more care to avoid an infinite loop or an unfinished read. Both would be very problematic for user applications: an infinite loop would pin a CPU core, and returning less data than expected would silently truncate the decompressed output.

    }
    totalBytes += len;
    if (totalBytes > maxLength) {
      throw new AvroRuntimeException(

This limit breach throws AvroRuntimeException even though decompress() is declared throws IOException; callers that currently catch IOException for decompression failures may miss this and crash. Consider using an IOException subclass (or otherwise documenting this behavior change) so error handling remains consistent.


Owner Author

value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! The method signature declares that it would throw an IOException and this is what its callers would expect. Throwing an AvroRuntimeException may not be handled by the caller and could lead to internal errors in the application.

@martin-augment
Owner Author

1. Resource Leak - Inflater not properly managed

Location: DeflateCodec.java:101-120

value:annoying; category:bug; feedback:The Claude AI reviewer is not correct! The getInflater() method either creates a new instance or resets the reused instance, so it is not possible to use a "dirty" Inflater.

@martin-augment
Owner Author

42-55: Harden max-length property parsing (negative/zero + SecurityException)
getMaxDecompressLength() currently accepts negative/zero values (making behavior surprising) and can throw SecurityException under restricted environments. Consider validating > 0 and falling back safely.

value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! Misconfiguration may lead to parsing a negative value for the MAX_DECOMPRESS_LENGTH and this will prevent any usage of the DeflateCodec. Prevents finding an issue too late (at runtime instead of at start time).
